ABSTRACT


Human activity recognition is an important area of machine learning research
because it has many applications, including sports training, security,
entertainment, ambient-assisted living, and health monitoring and
management. Studies of human activity recognition show that researchers are
mostly interested in the activities of daily living. Today's mobile phones
are well equipped with advanced processors, more memory, powerful batteries
and built-in sensors. This opens up new areas of data mining for recognizing
the activities of daily living. The benchmark dataset used in this work was
acquired from the WISDM laboratory and is available in the public domain. We
ran experiments using the AdaBoost.M1 algorithm with Decision Stump,
Hoeffding Tree, Random Tree, J48, Random Forest and REP Tree to classify six
activities of daily life using the Weka tool. We also examined the test
output of the Weka Experimenter for these six classifiers. We found that
using AdaBoost.M1 with Random Forest, J48 and REP Tree improves overall
accuracy. We showed that the difference in accuracy of the Random Forest,
REP Tree and J48 algorithms compared to Decision Stump and Hoeffding Tree is
statistically significant. The accuracy of these algorithms is also higher,
so we can say that they achieved a statistically significantly better result
than the Decision Stump, Hoeffding Tree and Random Tree baselines.

1. INTRODUCTION

Human activity recognition (HAR) underlies many applications, such as those
dealing with personal biometric signatures, advanced computing, health and
fitness monitoring, and elder care. The input of a HAR model is the raw
sensor readings, and the output is a prediction of the user's motion
activity. HAR has become an emerging discipline within pervasive and
intelligent computing. According to the World Health Organization (WHO), the
number of diabetic patients in the world population is increasing
drastically over time (WHO, 2016). For the first time in history, the number
of older persons (60 years or older) in the world will exceed the number of
young persons (below 15) by the year 2050. Such an ageing population needs
care. Activity recognition is a significant research area that can provide a
solution to this problem. It has many applications in healthcare, elder
care, user interfaces, smart environments, and security. Image- and
video-based human activity recognition has been studied for a long time, but
it mostly requires infrastructure support, for example the installation of
video cameras in the monitored areas. Alternative approaches are available,
such as body-worn sensors or a smartphone with built-in sensors, to
recognize the activities of daily living. However, apart from patients,
people cannot be expected to wear many sensors on the body. Today's
smartphones are small and well equipped with powerful sensors and
long-lasting batteries, which provides an opportunity for data mining
research and applications in human activity recognition using mobile phones.
Some existing works have explored human activity recognition using data from
accelerometer sensors. Many studies have achieved very good accuracy in
recognizing daily activities using a tri-axial accelerometer.

2. Sensor approaches 

Two types of sensors are used to recognize human activities: external
sensors and wearable sensors. In the past, sensors were installed at
predetermined points of interest, so the detection of activities was
essentially based on the interaction of users with the sensors. One example
of an external-sensor application is the intelligent home, which can
identify complicated activities such as eating, taking a shower, or washing
dishes, because it relies on data collected from various sensors placed on
specific objects that people interact with (e.g., stove, faucet, washing
machine). However, there is no useful response if the user is out of range
of the sensors or if the activity does not involve interacting with those
objects. Moreover, installing and maintaining the sensors is costly.

Extensive research has also focused on recognizing activities and gestures
from video sequences. This is most appropriate for security and interactive
applications. Microsoft developed the Kinect motion sensor for game
consoles, which lets the user interact with the game using gestures without
any controller device. However, video-based HAR has some issues:

> Privacy: no one wants to be constantly monitored and recorded by cameras.

> Pervasiveness: it is difficult to attach video recording devices to
individuals in order to capture images of their entire body during daily
living activities.

> Cost: video processing techniques are comparatively expensive and
time-consuming.

The above-mentioned limitations motivate the use of wearable sensors in HAR.
The measured attributes mostly relate to environmental variables (such as
temperature and humidity), the movement of the user (measured, for example,
with GPS or accelerometers), or physiological signals (such as heart rate or
electrocardiogram). These data are indexed over the time dimension.

Accelerometer sensors sense acceleration events on a mobile phone, Wii
Remote or wearable sensor. The raw data stream from the accelerometer is the
acceleration along each axis in units of g-force, represented as a set of 3D
acceleration vectors. A time stamp can also be returned together with the
three axis readings. Most existing accelerometers provide an interface to
configure the sampling frequency, so that the user can choose the sampling
rate that best matches their needs. The first works on HAR date back to the
late 1990s, yet there is still strong motivation to develop new techniques
that improve accuracy under more realistic conditions.

3. Challenges facing HAR system designers

The design of any HAR system depends on the activities to be recognized. The
kinds of activities and their complexity can affect the quality of
recognition. Some of the challenges facing researchers are (1) how to select
the attributes to be measured, (2) how to construct a system with portable,
unobtrusive, and inexpensive data acquisition, (3) how to extract features
and design inference methods, (4) how to collect data in a real environment,
(5) how to recognize the activities of new users without re-training the
system, and (6) how to implement the system on mobile devices while meeting
energy and processing limitations.

4. Offline versus online HAR systems 

Human activity can be recognized using offline or online techniques.
Whenever online processing is not necessary for the application, offline
processing can be used. For example, if the goal is to track a person's
daily routine, the data can be collected by the sensors during the day and
uploaded to a server at the end of the day. The data can then be processed
offline for classification purposes.

However, in some applications, such as a fitness coach, the user follows a
given program consisting of a set of activities with a prescribed sequence
and duration. The application needs to identify what the user is currently
doing, and therefore requires an online technique.

Another application is recruitment for participatory sensing. For instance,
an application may aim to collect information from users while they walk in
a specific location in the city. Online recognition of activities is
therefore important. Some studies that perform offline recognition of human
activities use machine learning tools such as WEKA. Nowadays, some cloud
systems are being used for online recognition.

5. Data collection 

In this paper, we use a standard HAR dataset that is publicly available from
the WISDM group. An Android smartphone application was used to collect the
data. Each user was asked to carry the smartphone in a front leg pocket and
perform six different activities under supervised conditions: walking,
jogging, walking upstairs, walking downstairs, sitting, and standing. While
these activities were performed, the sampling rate of the accelerometer
sensor was kept at 20 Hz. The WISDM HAR dataset consists of the
accelerometer's raw time-series data; a detailed description is shown in
Table 1.


Table 1. Description of the WISDM HAR dataset

| Description | No. of records | % of records |
|---|---|---|
| Total no. of samples | 10,98,207 | 100% |
| No. of attributes | 6 | |
| Any missing value | None | |
| Activity-wise distribution | Total no. of samples | Percentage |
| Walk | 4,24,400 | 38.6% |
| Jog | 3,42,177 | 31.2% |
| Up-stairs | 1,22,869 | 11.2% |
| Down-stairs | 1,00,427 | 9.1% |
| Sit | 59,939 | 5.5% |
| Stand | 48,395 | 4.4% |
| Transformed examples | | |
| Total no. of samples | 5,424 | |
| No. of attributes | 46 | |
| Any missing value | None | |
| Activity-wise distribution | Total no. of samples | Percentage |
| Walk | 2,082 | 38.4% |
| Jog | 1,626 | 30.0% |
| Up-stairs | 633 | 11.7% |
| Down-stairs | 529 | 9.8% |
| Sit | 307 | 5.7% |
| Stand | 247 | 4.6% |


5.1. Feature generation 

Before a classifier can be applied, the raw sensor data must be transformed.
The raw accelerometer signal consists of one value for each of the three
axes. To accomplish this, J. R. Kwapisz et al. segmented the data into
10-second windows without overlap, because they considered that 10 seconds
of data, corresponding to 200 readings, contain sufficient repetitions of
the motion. They then generated features from each segment of 200 raw
accelerometer readings. A total of 43 features are generated, all of which
are variants of six extraction methods. Average, Standard Deviation, Average
Absolute Difference and Time between Peaks are extracted for each axis. In
addition, Average Resultant Acceleration and Binned Distribution are
extracted.
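
The windowing and most of the per-window features described above can be
sketched in a few lines of Python. This is only an illustration, not the
code used by Kwapisz et al.: the function names are hypothetical, the use of
10 equal-width bins per axis and of the population form of the standard
deviation are assumptions, and Time between Peaks is left out, so the sketch
yields 40 rather than 43 values per window.

```python
import math

WINDOW = 200  # 10 s of data at 20 Hz, as stated in the text


def segment(samples, size=WINDOW):
    """Split a list of (x, y, z) readings into non-overlapping windows."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def axis_features(values, bins=10):
    """Average, standard deviation, average absolute difference and binned
    distribution for one axis of one window (bins=10 is an assumption)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population form
    aad = sum(abs(v - mean) for v in values) / n
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for a constant signal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return mean, std, aad, [c / n for c in counts]


def window_features(window):
    """Return a dict of features for one window of (x, y, z) readings."""
    feats = {}
    for name, values in zip("xyz", zip(*window)):
        mean, std, aad, hist = axis_features(values)
        feats[f"{name}_mean"] = mean
        feats[f"{name}_std"] = std
        feats[f"{name}_aad"] = aad
        for b, fraction in enumerate(hist):
            feats[f"{name}_bin{b}"] = fraction
    # Average resultant acceleration: mean Euclidean norm of the readings.
    feats["resultant"] = sum(
        math.sqrt(x * x + y * y + z * z) for x, y, z in window
    ) / len(window)
    return feats
```

Each window produced by `segment` would be passed to `window_features`, and
the resulting rows, together with the activity label, form the transformed
examples on which the classifiers are trained.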

5.2. Classification 

For the classification of human activities of daily living, we used
classifiers available in the Weka tool. The selected decision tree
algorithms are Decision Stump, Hoeffding Tree, Random Tree, REP Tree, J48
and Random Forest, each combined with adaptive boosting as implemented in
Weka's AdaBoost.M1 with default settings.
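
To make the boosting step concrete, the following is a minimal pure-Python
sketch of AdaBoost.M1 with a decision stump as the weak learner. It follows
the textbook description of the algorithm (reweight correctly classified
instances by beta = err / (1 - err), vote with weight log(1 / beta)) and is
not Weka's implementation; all function names and the toy data are
hypothetical.

```python
import math
from collections import defaultdict


def train_stump(X, y, w):
    """Fit a weighted decision stump: one feature, one threshold, and the
    weighted-majority label on each side of the threshold."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left, right = defaultdict(float), defaultdict(float)
            for row, label, wi in zip(X, y, w):
                (right if row[f] > thr else left)[label] += wi
            left_label = max(left, key=left.get)
            right_label = max(right, key=right.get) if right else left_label
            err = sum(wi for row, label, wi in zip(X, y, w)
                      if (right_label if row[f] > thr else left_label) != label)
            if best is None or err < best[0]:
                best = (err, f, thr, left_label, right_label)
    return best


def stump_predict(stump, row):
    _, f, thr, left_label, right_label = stump
    return right_label if row[f] > thr else left_label


def adaboost_m1(X, y, rounds=10):
    """Return a list of (vote weight, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = stump[0]
        if err >= 0.5 or err == 0:
            # AdaBoost.M1 stops here; keep the stump if it is the only one.
            if not ensemble:
                ensemble.append((1.0, stump))
            break
        beta = err / (1.0 - err)
        ensemble.append((math.log(1.0 / beta), stump))
        # Down-weight correctly classified instances, then renormalize.
        w = [wi * beta if stump_predict(stump, row) == label else wi
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted vote over all stumps in the ensemble."""
    votes = defaultdict(float)
    for alpha, stump in ensemble:
        votes[stump_predict(stump, row)] += alpha
    return max(votes, key=votes.get)


# Toy one-feature, three-class data (hypothetical, not from WISDM).
X = [[0], [1], [2], [3], [4], [5]]
y = ["a", "a", "b", "b", "c", "c"]
model = adaboost_m1(X, y, rounds=3)
```

A single stump can output at most two labels, so on its own it cannot
separate three classes; on this toy data the weighted vote of three stumps
does. The same limitation of an individual stump is relevant when reading
the Decision Stump results reported later.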

5.3. Performance measure 

The following performance measures were used in the experiments. Overall
accuracy summarizes the classification performance over all classes. With
TP, FP, TN and FN denoting the numbers of true positives, false positives,
true negatives and false negatives, the measures are defined as follows:

> Overall accuracy = (TP + TN) / (TP + FP + FN + TN)

> Precision = TP / (TP + FP)

> Recall = TP / (TP + FN)

> Specificity = TN / (TN + FP)
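
For a six-class problem these quantities are obtained per class by treating
that class as positive and all others as negative, while overall accuracy is
the share of instances on the diagonal of the confusion matrix. The sketch
below illustrates this using the Random Forest confusion matrix reported in
Table 7; the function names are hypothetical and the code is not part of the
original experiments.

```python
LABELS = ["Walking", "Jogging", "Upstairs", "Downstairs", "Sitting", "Standing"]

# Confusion matrix from Table 7 (rows: actual class, columns: predicted class).
RANDOM_FOREST = [
    [2048, 2, 15, 14, 0, 2],
    [2, 1605, 11, 6, 1, 0],
    [15, 13, 516, 85, 2, 1],
    [29, 7, 95, 393, 3, 1],
    [0, 0, 2, 0, 302, 2],
    [1, 0, 4, 1, 0, 240],
]


def one_vs_rest(cm, k):
    """Return (TP, FP, FN, TN) for class k against all other classes."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # actual k, predicted something else
    fp = sum(row[k] for row in cm) - tp   # predicted k, actually something else
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def class_metrics(cm, k):
    tp, fp, fn, tn = one_vs_rest(cm, k)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def overall_accuracy(cm):
    """Correctly classified instances divided by all instances."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)


for k, label in enumerate(LABELS):
    m = class_metrics(RANDOM_FOREST, k)
    print(f"{label:<11} precision={m['precision']:.3f} "
          f"recall={m['recall']:.3f} specificity={m['specificity']:.3f}")
print(f"overall accuracy={overall_accuracy(RANDOM_FOREST):.3f}")
```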

6. Experimental results 

The experiments were performed in the following steps.

> Acquisition of the standard WISDM HAR dataset for human activity
recognition through a mobile device, which is available in the public
domain.

> Partitioning of the dataset into training and testing sets using 10-fold
cross-validation.

> Selection of the AdaBoost.M1 meta classifier for classification, with each
selected decision tree classifier as the base learner and default
parameters.

> Examination of each classification model using 10-fold cross-validation.

> Comparative analysis on the basis of performance measures such as
classification accuracy, TP rate, FP rate, minimum RMSE, F-measure,
precision, recall and ROC.

> We used the Weka Experimenter environment to determine the mean and
standard deviation of the performance of each classification algorithm on
the WISDM dataset.

> We chose decision tree classifiers and 10-fold cross-validation as the
experiment type, in which the WISDM dataset is divided into 10 parts
(folds), and compared their results using the Adaptive Boosting meta
classifier. The significance level was kept at 0.05.
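
The 10-fold partitioning used in the steps above can be sketched as follows.
This is a minimal illustration that keeps the class proportions roughly
equal in every fold; it is not Weka's own partitioning code, and the
function names and the seed are hypothetical. The class counts are those of
the transformed examples in Table 1.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Deal instance indices into k folds, class by class, so that every
    fold receives a near-equal share of each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0  # keeps rotating across classes so fold sizes stay balanced
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds


def train_test_splits(folds):
    """Yield (train indices, test indices), using each fold once for testing."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Class counts of the transformed WISDM examples (Table 1).
COUNTS = {"Walk": 2082, "Jog": 1626, "Up-stairs": 633,
          "Down-stairs": 529, "Sit": 307, "Stand": 247}
labels = [name for name, n in COUNTS.items() for _ in range(n)]
folds = stratified_folds(labels, k=10)
```

Each of the ten `(train, test)` pairs trains a model on nine folds and
evaluates it on the remaining one; the ten results are then averaged.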

Finally, we used the Weka Experimenter to evaluate the performance of the
classifiers mentioned in the earlier section on the standard WISDM dataset.
Each classifier is trained and tested using 10-fold cross-validation
repeated 10 times.
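
Ten repetitions of 10-fold cross-validation give 100 paired accuracy values
for any two classifiers, which can be compared with a paired t-test. The
sketch below uses the corrected resampled t-test, under the assumption that
the Experimenter's corrected paired tester was used; the function name is
hypothetical, and the critical value 1.984 is the two-tailed t value for 99
degrees of freedom at the 0.05 level.

```python
import math
import statistics

T_CRITICAL_DF99 = 1.984  # two-tailed, significance 0.05, 99 degrees of freedom


def corrected_paired_t(acc_a, acc_b, test_train_ratio=1 / 9):
    """Corrected resampled paired t statistic for two lists of per-fold
    accuracies obtained on identical folds. For 10-fold cross-validation
    the test/train size ratio is 1/9."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    k = len(diffs)
    mean_d = statistics.mean(diffs)
    var_d = statistics.variance(diffs)  # sample variance of the differences
    if var_d == 0:
        return 0.0 if mean_d == 0 else math.copysign(math.inf, mean_d)
    return mean_d / math.sqrt((1 / k + test_train_ratio) * var_d)


# Hypothetical per-fold accuracies, for illustration only. These are NOT
# results from the paper.
acc_a = [0.94 + 0.01 * math.sin(i) for i in range(100)]
acc_b = [0.64 + 0.02 * math.cos(i) for i in range(100)]
t = corrected_paired_t(acc_a, acc_b)
significant = abs(t) > T_CRITICAL_DF99
```

The correction term `test_train_ratio` widens the variance estimate to
account for the overlap between training sets across folds, which makes the
test more conservative than the plain paired t-test.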

6.1. Confusion matrix for classifiers 

The confusion matrices for Decision Stump, Hoeffding Tree, Random Tree, REP
Tree, J48 and Random Forest are shown in Table 2 to Table 7.


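Table 2. Confusion Matrix for AdaBoost.M1 Meta Classifier with Decision Stump

| classified as | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| a=Walking | 2014 | 67 | 0 | 0 | 0 | 0 |
| b=Jogging | 185 | 1440 | 0 | 0 | 0 | 0 |
| c=Upstairs | 588 | 44 | 0 | 0 | 0 | 0 |
| d=Downstairs | 519 | 9 | 0 | 0 | 0 | 0 |
| e=Sitting | 306 | 0 | 0 | 0 | 0 | 0 |
| f=Standing | 246 | 0 | 0 | 0 | 0 | 0 |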


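Table 3. Confusion Matrix for AdaBoost.M1 Meta Classifier with Hoeffding Tree

| classified as | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| a=Walking | 1863 | 89 | 67 | 42 | 5 | 15 |
| b=Jogging | 53 | 1520 | 38 | 4 | 0 | 10 |
| c=Upstairs | 346 | 46 | 115 | 94 | 3 | 28 |
| d=Downstairs | 327 | 11 | 61 | 109 | 1 | 19 |
| e=Sitting | 0 | 0 | 1 | 0 | 288 | 17 |
| f=Standing | 0 | 0 | 19 | 3 | 25 | 199 |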


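Table 4. Confusion Matrix for AdaBoost.M1 Meta Classifier with Random Tree

| classified as | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| a=Walking | 2042 | 5 | 18 | 14 | 0 | 2 |
| b=Jogging | 10 | 1601 | 6 | 6 | 1 | 1 |
| c=Upstairs | 27 | 19 | 501 | 80 | 4 | 1 |
| d=Downstairs | 36 | 9 | 119 | 360 | 2 | 2 |
| e=Sitting | 1 | 0 | 2 | 1 | 299 | 3 |
| f=Standing | 1 | 2 | 5 | 2 | 6 | 230 |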


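Table 5. Confusion Matrix for AdaBoost.M1 Meta Classifier with REP Tree

| classified as | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| a=Walking | 2065 | 9 | 3 | 3 | 1 | 0 |
| b=Jogging | 32 | 1575 | 10 | 8 | 0 | 0 |
| c=Upstairs | 4 | 8 | 500 | 120 | 0 | 0 |
| d=Downstairs | 9 | 6 | 112 | 401 | 0 | 0 |
| e=Sitting | 2 | 1 | 6 | 3 | 292 | 2 |
| f=Standing | 5 | 2 | 6 | 9 | 2 | 222 |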


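Table 6. Confusion Matrix for AdaBoost.M1 Meta Classifier with J48

| classified as | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| a=Walking | 2019 | 11 | 23 | 26 | 2 | 0 |
| b=Jogging | 10 | 1585 | 16 | 14 | 0 | 0 |
| c=Upstairs | 39 | 30 | 445 | 115 | 2 | 1 |
| d=Downstairs | 34 | 16 | 83 | 390 | 4 | 1 |
| e=Sitting | 2 | 0 | 3 | 1 | 297 | 3 |
| f=Standing | 0 | 2 | 6 | 1 | 2 | 235 |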


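Table 7. Confusion Matrix for AdaBoost.M1 Meta Classifier with Random Forest

| classified as | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| a=Walking | 2048 | 2 | 15 | 14 | 0 | 2 |
| b=Jogging | 2 | 1605 | 11 | 6 | 1 | 0 |
| c=Upstairs | 15 | 13 | 516 | 85 | 2 | 1 |
| d=Downstairs | 29 | 7 | 95 | 393 | 3 | 1 |
| e=Sitting | 0 | 0 | 2 | 0 | 302 | 2 |
| f=Standing | 1 | 0 | 4 | 1 | 0 | 240 |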


As shown by the confusion matrix in Table 2 and the performance criteria in
Table 8, the Decision Stump classifier never predicts the Upstairs,
Downstairs, Sitting or Standing classes: every instance of these activities
is assigned to Walking or Jogging, most often to Walking. The
misclassification of the stair and sitting activities as walking is
therefore common. In contrast, the performance of REP Tree, J48 and Random
Forest is much better than that of the other classifiers.
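
The behaviour of the boosted Decision Stump can be checked directly from the
counts in Table 2. The short sketch below (variable names are hypothetical)
computes how often each class is predicted and the recall of each class.

```python
LABELS = ["Walking", "Jogging", "Upstairs", "Downstairs", "Sitting", "Standing"]

# Confusion matrix from Table 2 (rows: actual class, columns: predicted class).
DECISION_STUMP = [
    [2014, 67, 0, 0, 0, 0],
    [185, 1440, 0, 0, 0, 0],
    [588, 44, 0, 0, 0, 0],
    [519, 9, 0, 0, 0, 0],
    [306, 0, 0, 0, 0, 0],
    [246, 0, 0, 0, 0, 0],
]

# Column sums: number of instances assigned to each class.
predicted = {label: sum(row[j] for row in DECISION_STUMP)
             for j, label in enumerate(LABELS)}

# Diagonal over row sum: share of each actual class that is recognized.
recall = {label: row[i] / sum(row)
          for i, (label, row) in enumerate(zip(LABELS, DECISION_STUMP))}

total = sum(sum(row) for row in DECISION_STUMP)
accuracy = sum(DECISION_STUMP[i][i] for i in range(len(LABELS))) / total

for label in LABELS:
    print(f"{label:<11} predicted={predicted[label]:>4} recall={recall[label]:.3f}")
print(f"overall accuracy={accuracy:.3f}")
```

Four of the six columns sum to zero, so recall is zero for Upstairs,
Downstairs, Sitting and Standing, and only the Walking and Jogging rows
contribute to the overall accuracy.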

6.2. Performance criteria for classifiers

The performance criteria for classifiers are as shown in 
Table 8 to Table 13. 


Table 8. Performance Criteria for AdaBoost.M1 Meta Classifier with Decision
Stump



Table 11. Performance Criteria for AdaBoost.M1 Meta Classifier with REP Tree

[Columns: Class, TP Rate, FP Rate, Precision, Recall, F-Measure, MCC, ROC
Area, PRC Area. Rows: Walking, Jogging, Upstairs, Downstairs, Sitting,
Standing, Weighted Avg. The cell values are illegible in the source, except
PRC Area = 0.915 for Standing.]


Table 12. Performance Criteria for AdaBoost.M1 Meta Classifier with J48

[Columns: Class, TP Rate, FP Rate, Precision, Recall, F-Measure, MCC, ROC
Area, PRC Area. Rows: Walking, Jogging, Upstairs, Downstairs, Sitting,
Standing, Weighted Avg. The cell values are illegible in the source, except
Recall = 0.97 for Walking, MCC = 0.957 for Jogging and MCC = 0.696 for
Downstairs.]


Table 13. Performance Criteria for AdaBoost.M1 Meta Classifier with Random
Forest

[Columns: Class, TP Rate, FP Rate, Precision, Recall, F-Measure, MCC, ROC
Area, PRC Area. Rows: Walking, Jogging, Upstairs, Downstairs, Sitting,
Standing, Weighted Avg. The cell values are illegible in the source, except
ROC Area = 0.996 for Weighted Avg.]


7. Conclusion 

This paper surveys the state of the art in human activity recognition based
on measured acceleration components. In the Weka Experimenter output, the
Random Forest, REP Tree and J48 algorithms have a small "v" next to their
results, which means that the difference in accuracy of these algorithms
compared to Decision Stump and Hoeffding Tree is statistically significant.
The accuracy of these algorithms is also higher than that of Decision Stump
and Hoeffding Tree, so it can be said that they achieved a statistically
significantly better result than the Decision Stump, Hoeffding Tree and
Random Tree baselines.